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1  .         INTRODUCTION 

The    purpose   of   this    study    is    to    investigate    a    use   of   time 
series    in    improving   weather    forecasting. 

Let    us    consider   a    simple   model.      Let    us    assume   that    the 
relevant    variables,    or    functions,    such    as,    say,    the    500    mb    height 
field,    can    be   expressed    in    terms    of   a    set   of   orthogonal    basis 
functions,    whose    coefficients    define   the    field.      There    is    also    an 
associated    numerical    integration    program   which   yields    predictions 
of   this    variable    in    the    form   of   predicted    coefficients.      This    method 
of   forecasting    is    assumed    to    be   more    reliable    than    any    regression 
scheme.      At    regular    intervals    some   observations    of   the    variable 
are   made    and    the    predicted    coefficients    are    then    updated,    or 
corrected . 

These    corrections    to    the    coefficients    in    the    forecast    are 
analyzed    for    trends.      The    corrections    for    each    (complex)    coeffi- 
cient  of   a    (complex)    ei genf uncti o n    or    basis    function    are    analyzed 
separately.      The    purpose    is    to    predict    the    corrections. 

The    basic    approach    is    to    define   a    best-fitting    difference 
equation    and    a    best    set    of    initial    conditions,    by   minimizing 
weighted    sums    of   squares    of   the    residuals,    and    then    use    the 
difference    equatio.n    and    the    initial    values    to    predict    the    correc- 
tion.     These    are    used    in    a    secondary    routine    to    improve    the    forecast 

Testing   of    the   method    in    the    500    mb    field   with    a    simple 
non-divergent    model    due    to    Bourke    (1972)    shows    skill    with    some 
coef fi  ci  ents . 


2.    OUTLINE  OF  THE  PROBLEM 

Let  us  assume  that  we  have  some  function,  or  functions,  which 
we  are  representing  in  terms  of  an  orthonormal  set  of  basis 
functions.   In  particular,  let  us  consider  the  500  mb  height 
function,  or  perhaps,  the  stream  function,  expressed  on  the  surface 
of  the  earth  in  terms  of  spherical  harmonics.   These  are  expressed 
as  complex  trigonometric  functions  of  longitude  and  Legendre 
functions  of  the  sine  of  the  latitude.   At  time  tN  the  coefficients 
of  the  spherical  harmonics  form  a  vector  or  column  matrix  AN;  its 
components  are  complex  numbers  a.  N ,  k=l ,  ...,  K,  if  the  series 
is  truncated  to  K  terms.   The  vectors  AM ,  and  each  of  the  components 


N 


a.  N,  form  a  time  series,  or  sequence. 


The  sequence  is  generated  conceptually  as  follows.   Let  us 

assume  that  at  time  tN_,  we  have  the  matrix  AN_,  of  coefficients 

for  the  analyzed  field.   This  is  considered  to  represent  the  true 

field  at  that  time.   We  have  also  an  integration  procedure  by  which 

we  can  generate  predictions  AN_,  N  ,  AN_,  N+, ,  ...,  for  the  future 

values  AM,  A..,-,,  ...,  as  many  as  we  feel  are  useful  or  reliable. 
N    N  +  l  J 

Then  at  time  tN,  12  hours  later,  observations  are  made.   These  are 
combined  with  the  predictions  A.,,  N  toyield  a  new  analyzed  field, 
and  a  new  set  of  coefficients  A...   The  corrections  (changes, 
errors,  or  discrepancies), 

Z   =  A   -  A 


n  =  2,  3,  . . . ,  N, 


(2.1) 


n    n    n-1  ,n 
also  define  a  time  series  or  a  sequence.   If  we  could  in  some  way 

analyze  this  sequence,  Z,,,  Z_,  ..,  Z..,  and  from  it  predict 

ZN  +  i  '  zm  +  ?'  •'•'  ZN+   reliably  then  we  could  predict  ahead  m  steps 


reliably.   Our  purpose  is  to  predict  more  accurately,  or  more 

reliably,  and  perhaps  to  extend  the  range  of  prediction.   We  shall 

see  that  there  are  two  ways,  at  least,  that  this  improvement  might 

be  effected.   Both  depend  on  the  use  of  time  series  as  developed 

in  the  next  sections  . 

The  Z's  and/or  their  components  form  the  data  for  the  analysis. 

As  such  they  are  called  observations  (not  to  be  confused  with 

meteorological  observations).   The  terms  component  and  coefficient 

are  used  interchangeably.   Vectors  have  components  and  basis 

functions  have  coefficients  when  they  are  used  in  the  representation 

of  another  function.   The  components  a.    of  A  ,  for  example,  are 

K        k  ,n     n  K 

the  coefficients  in  the  representation  of  some  meteorological 
variable.   Basis  functions  are  often  called  modes,  particularly  in 
mechani  cs . 


3.    INTRODUCTORY  TIME  SERIES,  SIMPLIFYING  ASSUMPTIONS 

Let  us  consider  the  sequence  of  corrections,  Z,  n  =  2,  3,  ...,  N 

The  vector  Z   has  K  complex  components.   There  is  one  complex  number 
n 

z,    for  each  of  the  K  complex-valued  spherical  harmonics  that  is 
k  5  n 

used.   To  save  writing,  let  z  ,  rather  than  z.   ,  denote  a  typical 

3  n  k  ,n  J r 

coefficient;   we   will    analyze   each    of    these    sequences    individually. 
Part   of   the    rationale    is    that    in    a    linear    uncoupled    system    the 
various    modes    move    independently;    in    a    loosely    coupled    system   we 
would   expect    similar    behavior.      There    are    other    important    reasons 
which   we   will    take    up    later.      We    shall     usually   consider    z      to    be   a 
pair   of   real    numbers,    a    two-by-one   matrix. 


Let  us  now  assume  that  z   can  be  expressed  as  a  sum  of  two 

n 


terms,  described  below, 


z   =  x   +  y 
n    n   J  n 


(3.1) 


Here  y   is  a  random  variable.   Its  components  each  are  assumed  to 
have  expected  value  zero  and  expected  square  a    ,  independent  of  n; 
they  are  also  assumed  to  be  uncorrel ated .   The  vector  x   is  assumed 
to  be  a  solution  to  a  difference  equation  of  the  form 


~1  n-1         "P""n-p 


Xn   =   ClXr,_l   +   •'•   +   Cr,Xn_n>     °   =   P   +   2>    '••»   N'       (3.2) 

id 


The  coefficients  C,  ,  ...,  C  are  unknown  two-by-two  matrices,  am 
the  order  p  is  to  be  chosen  some  way.  For  the  present  let  us  take 
p  =  2,  which  seems  at  this  time  to  be  a  likely  value.  The  C's  are 
not    constants    necessarily,    but   may    vary    adaptively   with    N. 

To    determine    the    coefficients    in    (3.2),    let    us    choose    the    C's 
to    minimize   a    residual    of   the    form 

N  p  2 


R    =   \  E    w    (z      -    E    C.z       .)       , 
2    ^      n      n        .j      j    n-j 


(3.3) 


where    the   w      are    positive   weight    factors    to    be   chosen   or    determined; 
n 

we   will    discuss    this    later.      To    minimize    R    let    us    differentiate   with 
respect    to    the    elements    of   the    C's;    we    get    a    system   of   equations    of 
the    form 


N 

E     W  ,2 

n   /    n-1    n-1 


N 
C,'\  -    =    wn  /znAzn 


n-1    n-  p  \  /    1 


z  '    i  >     •  •  •  »    z 


n-p    n-1 


n-  p    n-p 


(3.4) 


n-p, 


the  prime  (')  indicates  the  transpose  of  a  matrix.  The  terms  like 
zn-lzn-n  rePresent  two-by-two  matrices  so  that  the  elements  of  the 
matrices    are    themselves    matrices    of  order    two. 

Another    problem    is    the   choice    of   the   weight    factors    w    .      For 
a    time-independent    process    we    should    use   equal    weightings,    which 
simplifies    several    relations.      However    the    fundamental    relations 
may    vary    rather    rapidly   with    time    sometimes,    and    these   are    the    very 
times    that    are    most    critical.      One   way    to    take    account   of    this    is 
to    use   exponential    weightings    in    (3.3)    and    (3.4).      After    some 
starting    routine   we   weight    each    new    observation   with    a    uniform 
value   w   and    decrease    all    earlier   weightings    then   by    a    factor    1 -w . 
We    do    this    as    follows.      Let    QN    be    the   matrix   of   coefficients    in 
(3.4);    it    has    two-by-two    block    elements    of   the    form 


N 


Qi,J,N    =      I     wnzn-i2n-j'         iJ=1'     •••'    P; 


(3.5) 


the    right-hand    side    of    (3.4)    has    elements    Q .    n    N .      When   we    get    the 
observation    z  N  ,    we    update    Q .     .    N_ ,    by    the    relation 


Qi,j,N    =    Qi,j,N-l     +   w(zN-iZN-j    "    Qi  ,j,N-l  ) 


(3.6) 


We  may  actually  update  Q  by  first  shifting  each  block  element 
down  and  to  the  right  one  place,  dropping  out  the  last  row  and  the 
last  column.   Then  we  update  the  first  (block)  row,  using  (3.6), 
and  then  the  first  column  by  symmetry.   The  weight  of  an  observa- 
tion from  time  t   for  the  coefficients  calculated  at  time  tM  is 

n  N 

w(l-w)    "     ;    we    see    that   weightings    decrease    exponentially   with    time. 


The  best  choice  of  w  is  another  problem.  If  w  is  too  small 
the  system  is  slow  to  respond  to  changes,  and  if  w  is  too  large, 
random  errors  cause  excessive  errors  in  the  predicted  values. 

The  above  procedure  for  updating  the  coefficients  in  (3.2)  is 
rather  efficient,  both  in  terms  of  storage  and  computation.   The 
matrix  Q  is  always  singular  until  at  least  2p  observations  have 
been  made.   It  is  usually  positive  definite  after  that,  but  it  was 
sometimes  found  to  be  near  singular,  especially  for  the  smaller 
val ues  of  N . 

In  order  to  predict  the  z's  ahead,  using  (3.2),  we  also  need  p 

starting  or  initial  values;  we  can  take  these  concept ionally  as 

estimates  of  xM,  x   ,,...,  xM   ^,  .   The  determination  of  these  is 
N    n-1        N-p+1 

taken  up  in  the  next  section;  it  requires  more  storage  and  more 
computation  than  the  above. 

In  the  following  section  we  will  generally  consider  the  case 
P  =  2. 

There  may  be  some  minor  discrepancies  in  the  first  index;  for 
example,  when  the  a's  are  involved  we  tend  to  think  of  z^    as  the 
first  of  the  z's,  and  later,  when  treating  the  z's  we  think  of  the 
first  as  z ,  . 


4.    ROUTINE  TO  DETERMINE  THE  BEST  FILTERED  OR  INITIAL  VALUES 
FOR  PREDICTION 

Let  us  assume  that  we  have  found  the  coefficients  C.  in  the 
difference  Eq.  (3.2).   We  also  need  estimates  of  xN  and  xN  ,,  say 
XN'  *N-1  '  in  order  t0  Predict  ahead  using  the  difference  equation 
We  might  just  use  the  last  two  observations,  zN,  zN  ,,  but  these 
involve  individual  random  errors,  and  we  expect  smoothed  or 
filtered  values  to  be  more  reliable.   By  using  a  large  value  for 
the  weighting  parameter  w  we  can  drive  the  smoothed  values  close 
to  the  last  two  observations,  if  we  wish. 

Let  us  resolve  the  problem  as  follows.   Let  us  consider 

(n  =  1  ,  ...,  N) 
(n  =  3,  ...,  N) 

zn         (Cl[Clxn-2    +   C2Xn-3]    +    C2[Clxn-3    +    C2xn-4]) 


y      =    z      -    x 

J  n  n  n 

=    zn    -    (ClVl    +   C2xn-2> 


which   we    may    rewrite    in    the    form 
y«    =    z„    -    A    x9    -    B    x,     , 


n"2 


n  1 


(n  =  1 ,  ...,  N); 


(4.1) 


A   and  B   are  defined  by  this  relation.   It  is  easily  verified  that 
they  satisfy  the  relations 


1  n-1     2  n-2 


n  =  1  ,2,  .  .  .,  N, 


0,  A2  =  I, 


ClBn-l    +    C2Bn-2 
I,    B2    =    0. 


(4.2) 


We  can  simplify  the  programming  somewhat  if  we  define  the 
matrices 

Dn    =    (A    .BJ 


n      n 


(4.3) 


U  \X^j,         1'  *  ^  1  ?  '         v  ?  '      ^11'      ^  ?  1   '       ' 

x .  .    is    the    i'th    component   of   x..      Then    Dn    satisfies    Eq .    (3.2)    and 


has    the    initial    values 


Dn    =    C^""1     +    C2Dn"2 

D1     =    (0,     I),    D2    =    (I,    0)  . 


Equation    (4.1)    now    reduces    to 


(4.4) 


y      =    z      -    D    U 

J n  n 


(4.5) 

If  we    knew    the    four    components    of   U,    we   could    get    starting    values 
xN,    xN-1     from    Eq.    (3.2). 


xn        Clxn-1     +    C2xn-2' 


(4.6) 


and  if  we  select  or  estimate  U  in  some  way  we  get  corresponding 
estimates  for  x., ,  xN_,.   Of  course  it  would  be  nicer  to  determine 
these  in  a  more  direct  way,  but  there  seems  to  be  no  way  to  do 
this  . 

Now,  by  assumption,  the  expected  value,  E  y.  =  0.   Hence  it 
seems  reasonable  that  the  best  choice  for  U  is  one  which  minimizes 
a  weighted  sum  of  squares  for  the  estimates  of  y. ,  say, 


1  N     ?         l  N      n        2 
R=^-zwy=^Zw  (DnU  -  z  ) 

2  n-7  n    2    n        n 


(4.7) 


To    effect    this    we    differentiate   with    respect    to    the    components    of 
U;    if  we    call    its    components    u.,    the    j'th    partial    derivative   yields 

J 


Ro   ■ 


"   wnK,o(i    dl,kuk    "    zl.n>    +   d2,j<^    d2,kuk    "    z2,n>] 


=   0, 


(4.8) 


where  d.  k  are  the  components  D  .   We  solve  these  equations  for  U, 
which  defines  x-j,x2>  and  from  these  we  get  our  best  estimates, 
XN-1  ■  *N  for  s  tarti  ng  . 

There  are  several  problems.  First,  the  sequence  z  , 
n  =  1,  2  ...,  N,  may  be  very  long,  so  that  computing  times  and 
errors  may  be  significant.  Further  the  observations  z  for  small 
values  of  n  may  not  be  relevant  to  today's  weather.  We  also  need 
to  recall  many  observations.  These  problems  suggest  limiting  the 
range  of  summation  in  (4.7)  to,  say,  the  last  ten  terms,  which  we 
did. 

Another  problem  needs  some  explanation.   We  really  just  need 
xm  i  >  xm  t0  Predict  ahead.   However  we  cannot  get  these  directly. 
It  really  does  not  matter  whether  we  find  xN_,  and  x..,  or  xN_q 
and  X.,  o',  we  get  the  former  from  the  latter  by  use  of  difference 
Eq .  (3.2).   We  might  perhaps  try  to  rewrite  the  difference 
equation,  to  solve  for  x   «  in  terms  of  x   ,  and  x  ,  to  avoid 
this.   However  we  must  invert  the  matrix  C~  to  effect  this,  and 
there  is  no  reason  that  it  cannot  be  singular,  or  nearly  so, 
randomly  . 

In  the  program  we  have  15  x's  actually  indexed  from  1  to  15. 
The  first  10  of  these  are  the  10  approximants  to  the  last  10 
observations  zN  q,  ...,  zN;  the  last  five  are  predictions  for  the 
future  values  zN+1 ,  ...,  zN+5- 


5.         TWO    SUGGESTED    METHODS    FOR    IMPROVING    PREDICTION    BY    THE    USE    OF 
A    TIME    SERIES 

There    are    two    particular   methods    which    seem    feasible    for 
improving    the    forecast.      Each    of    these   seems    feasible,    but    each 
poses    some    probl ems  . 
5.1       Method    1 

Let    us    consider    again    the    matrices    for    the    coefficients    in    the 

analyzed    fields,    A    ,    n=l,     ...,    N;    we   wish    to    improve    the    predictions 

n 

of   these.      Let    us    assume    that   we    have    generated   and    stored    the 

following.      First,    we    have    stored    the    last    ten    corrections 

ZM    n,     ...,    Z...      We    have    also    stored,    for    each    term    in    the    Z's, 
N-9  N 

the    matrix    Q   which    defines    the    coefficients    C.    in    the    difference 
equation  . 

The    routine    is    the    following    for    each    component   of    Z.      As 
soon    as    the    new    value   of   Z,    Z.,,    is    obtained,    we    store    it    and 
discard    the    old    ZN_g.      Then   we    update    the    matrix    Q    and    solve    for 
the    coefficients    of    the    difference    equation   C.    as    described    in 
Section    3.      Then   we   solve    for    the    initial    values    Xm-i  >    x.,,    as 
described    in    Sect.    4,    and    finally   we    predict    ahead    to    get 

XN+1  '    xN  +  2'     *'"'    *N+M'    as    mar,y    as   we   wlsh.      These    define   the 
x  '  s  ,    or    X ' s  . 

Next    let    us      use    the    dynamic    equation    and    integrate    ahead    one 
time   step,    to    get   an    estimate  AN   N+-i  >   of  AN+1  .      Then   let,    say, 


AN,N+1         AN,N+1  ' 


and 


AN,N  +  1         AN,N+1     +   XN+1  * 


(5.1) 
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We  get  successive  values  in  a  similar  way.   When  we  have  A 


N,N  +  j 


we    integrate    to    obtain   AM    »,,.,-,,    a    first    estimate   of  AM    »',_,., 

N , N  +  j  +  1  N , N  +  j+1 


Then  we  add  X..^.^,  ,  to  get 
N  +  j+1 


AN, N  +  j+1  "  AN, N  +  j+1  +  XN  +  j+l  ' 


(5.2) 


With  the  initial  value  A.  N  =  AN>  the  sequence  of  predictions, 
or  forecasts,  is  defined. 

In  the  initial  part  of  the  study,  a  simple   integration 
routine  was  used  for  predicting,  A..  N+M>  a  model  using  the  non- 
divergent  barotropic  vorticity  equation.   When  this  model  is  used, 
the  method  of  correcting  above,  alternately  adjusting  and 
integrating  in  feasible  and  practical. 

When  a  more  accurate  and  complicated  weather  prediction 
model  is  used,  serious  difficulties  arise  with  this  method. 
Restarting  a  time  integration  after  rather  arbitrary  adjustments 
to  one  or  more  variables  may  generate  spurious  gravity  waves  which 
degrade  the  prediction  and  offset  the  adjustments.   To  eliminate 
this  difficulty  a  second  method  may  be  used,  in  which  a  separate 
series  is  generated  for  each  interval  of  prediction.   The  appro- 
priate term  is  used  for  each  interval  and  the  adjusted  functions 
are  never  integrated. 
5.2   Method  2 

There  is  another  method  which  gets  around  the  above  problem, 
at  the  expense  of  increasing  the  storage  and  computation  by  a 
factor  of  roughly  M,  if  we  use  M  different  intervals  of  prediction 
That  is,  if  we  wish,  say  to  adjust  the  12-,  24-,  and  36-hour 
forecasts  we  must,  generate  three  time  series. 
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Now    assume    that   we   wish    to    predict    ahead    M    steps.      At    the 
time    t^   we   will    have    the    predictions    based   on    integration, 


N,N+1'     '•"    AN5N+M    '    {AN,N+m}m  =  l, 


M 


(5.3) 


For  the  corrections  we  will  save  M  sequences  Z 


Zn  =  An  "  An-m,n'    m  =  1  •   '-  M'   n  =  m+1  •  •■■•  N- 

(5.4) 

Let    us    consider    any    particular    value   of   m.      To    generate    the    data 

for    the    time    series    we   must    store    each    time   the    uncorrected 

predictions    A„       .     .      Then    at    time    t    ^m   we   will    obtain    and    store 
n , n+m  n+m 

Cm    from    (5'4>- 

We  will  thus  have  M  time  series  to  be  analyzed.   Each  is 

analyzed  as  discussed  earlier.   In  this  case  however  we  will  use 

a  single  prediction  from  each  one:  from  Z  we  generate  the  single 

correction  for  m  time  steps  ahead.   If  we  denote  the  corrections 

by  X   the  modified  values  will  be 

J       n 


A  =   A  +    X 

n,n+m  n,n+m  n 


(5.5) 


This    method    has    the    advantage    that    the   modified    terms    are    not 
integrated.      It    has    the    disadvantage    that    it    requires    one    time 
series    for    each    value   of  m    used,    which    leads    to    larger    storage 
and    computational    requirements. 
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6.         COMMENTS 

The    problem    here    differs    from    the    most    common    applications    of 
time    series.      We    have   a    rather   small    amount   of   information   which 
tends    to    be   masked    by   a    large    random    element.      From    this    we   are 
trying    to    predict    transients,    or    trends,    for    relatively   short 
periods,    perhaps    one    to    ten    time    steps.      We    are    not    directly 
interested    in    the    smoothed    values,    since   we   are    not    concerned   with 
what    has    happened,    except    as    our    ability    to    fit    it    reflects    on   our 
ability    to    predict.      Our    feeling    is    that    the    variables    we    are 
predicting   may   change    rather    rapidly   so    that    the   weather    two   weeks 
ago    is    of   little    interest;    even    the    equation   which    governs    the 
behavior   may  well    have    changed.      We    cannot    really   make    use   of 
long-term   observations,    as    for   a    stable    system,    which    allow   more 
accurate    predictions. 

We    are    trying    to    predict    the    short-range   behavior   of   a    part 
of   a    non-linear    system    of   high    order    by    solutions    to    a    linear 
system   of   low   order.      Whether   we    try    to    do    this    by    a    difference 
equation   or    by   Taylor    series    depends    on    the    type    of   data,    and    the 
nature    of   the    expected    solutions.       In   our   case   both    the    form   of 
the    data    and    the    anticipated    periodic    properties    of   the    expected 
solutions    suggest    difference    equations. 

The    papers    by    Jones    (1963),    (1964)    particularly,    and    (1965) 
suggested    the    general    problem    here.      These    treat    more    the 
descriptive    and    the    long-range    forecasting    problem.      The    series 
are    considered    to    be    stationary    in    the    sense    that    all    observations 
are   weighted    equally;    several     formulas    are    then    simplified. 
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The    references    found   most    useful    in    this    study   were    Whittle 
(1963)    and    Gelb    (1974).      Recent    developments    in    the    use   of   spherical 
harmonics    in   meteorology    (see    GARP,    1974)    and    Fast    Fourier    Transforms 
make    the   methods    appear    feasible. 

The  method  is  clearly  that  of  least  squares.  It  is  also  a 
linear  regression  method;  all  of  the  equations  to  be  solved  are 
linear.  If  we  were  to  use,  say,  our  estimates  xN_,  ,  >L  in  a 
routine  to  try  to  improve  the  C's,  then  it  would  be  non  linear. 
The  solutions  to  the  difference  equations  are  basically  complex 
exponentials,  that  is,  a  combination  of  real  exponentials  and 
trigonometric    functions. 

There    are    a    number   of    points    that    should   be    discussed   or 
clarified. 
6 .1       Cons  i  s  tency 

It    should    be    pointed   out    that    even    for    a    stationary   system 

the    solutions    are    not    consistent,    as    follows.      Let    us    assume    that 

we    have    a    solution    to    a    difference    Eq .    (3.1)    to   which    random 

uncorrelated    terms    y    ,    each   with    expected    value    0    and    expected 

n 
2 
square    a      is    added.      Then    Eq  .    (3.4)    will     not    yield    the    desired 

coefficients    in    (3.2).      The   expected    elements    of    the   matrix   of 

coefficients    Q    in    (3.5),    from    (3.4)    will    be 

N  2 

Ql.J.N    =    Z    Wn(Vi    xn-j    +    6ij    a      V      i  'j=1  '     •••■    p;         (6J) 

here    I?    is    the    second-order    identity    matrix,    and    6    is    the    Kronecker 
6,    (=1    when    i=j    and    0   when    i^j).      That    is,    the    Q   matrices    have    an 

extra    term    a      on    the    main    diagonal.      This    suggests    that    if  we    have 

2 
a    measure    of   a      we    could    subtract    it    from    the   elements    on    the   main 
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diagonal  of  Q,  in  a  sort  of  negative  ridge-regression  scheme. 
This  is  very    risky  procedure  however,  since  it  drives  the  matrix 
of  coefficients  toward  singularity.   Difficulty  has  been 
encountered  several  times  because  this  matrix  Q  was  singular,  and 
the  danger  is  aggravated  when  exponential  weighting  with  a  large 
decay  rate  is  used. 
6 .2   Best  Order  for  the  Difference  Equation 

We  have  suggested  that  we  use  a  second-order  difference 
equation,  p  =  2.   Some  of  the  reasons  follow.   Let  us  consider  a 
system  which  is  time  independent,  and  assume  that  have  found  the 

correct  order  p  and  the  true  coefficients  C  in  the  difference 

2 
equations.   In  this  case  the  residuals  yield  an  estimate  for  a  . 

Now  consider  the  residual 

P  2 


N 
R  =   E 
P 


Z+1  wn{xn  +  *n  "  \    Cj(xn-j  +  *n-j)} 


=  S  wn(*n  "  E  Vn-j)   • 

since  the  x's  satisfy  (3.2).   The  expected  value  of  the  residual 
is  then 

E(R)  =  (I  wn)  (2  ♦  l    c?jk.)  a2, 

where    the    last    term    in    the    parenthese    denotes    the    sum   of   the 
squares    of   all    of    the    components    of   all    of   the   C's. 

Now    if   we    try    to    fit    the    solutions    to    a    second-order    system 
by    a    third-order   difference    equation,    in    the    absence   of   noise,    the 
matrix   of   coefficients    is    singular,    since    the    solutions    satisfy 
many    third-order    difference    equations.       If  we    have    a    little    noise 
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we  may  have  a  near-singular  matrix  and  have  large  values  for  the 

Ci  , j  k'  in  this  case  we  9et  a  large  expected  value  in  (6.1)  for 
the  residual.   Limited  numerical  checks  make  the  second-order 
equation  appear  satisfactory. 
6  .  3   Determining  Initial  Values 

There  was  no  particular  reason  for  choosing  ten  values  to  fit 
when  we  determine  the  starting  values  for  prediction.   It  was  felt 
that  six  would  be  the  smallest  number  to  be  considered,  that  fewer 
would  make  the  methods  vulnerable  to  random  errors  in  the  last 
terms.   Many  more  than  ten  could  cause  roundoff  errors. 
6.4   Criteria  for  Adjusting  Terms 

We  do  not  expect  to  adjust  or  correct  all  of  the  coefficients. 
For  some  the  errors  may  be  too  small  to  warrant  the  time  and 
trouble.   Other  coefficients  may  contain  such  random  elements  and 
change  so  unpredictably  that  trends  are  effectively  obscured.   The 
decision  to  adjust  or  correct  will  probably  depend  on  an  analysis 
of  similar  sequences  in  the  past.   There  are  two  basic  ways  to  do 
this,  one  based  on  a  long  sequence  of  past  data  (that  is,  on 
climatology),  and  the  other  on  recent  data  (corresponding  to  recent 
weather )  . 

We  may  analyze  a  long  sequence  for  a  coefficient,  perhaps  over 
a  period  of  years,  and  from  this  make  a  decision,  beforehand 
always   to  correct,  or  never,  in  the  program. 

An  alternate  way  is  to  see  how  well  the  smoothed  values  fit 
the  last  ten  corrections.   We  generate  these  values  naturally  in 
determining  the  starting  values  for  prediction,  so  this  entails 
little  computation.   We  may  consider,  for  example,  the  ratio 
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10 
1-25  Z   wn(zN_1Q 


10 


+  n  -  xn>  t\      V 


N-10+n  ' 


If  this  is  much  less  than  one  we  have  fitted  the  past  ten  values 
well,  and  if  it  exceeds  one  we  have  no  skill  in  fitting  the 
observations.   The  factor  1.25  adjusts  for  the  fact  that  we  could 
fit  two  terms  exactly  with  the  initial  conditions.   We  might 
correct  or  not  accordingly  as  this  number  was  below  or  above  some 
criterion,  say,  .75.   The  philosophy  is  that  if  we  can  fit  a 
sequence  of  observations  well,  then  we  can  predict  well.   (More 
logical  is  the  converse; if  we  cannot  fit  the  observed  values,  then 
we  cannot  predict  well.)   It  is  probably  not  worthwhile  to  use 
this  criterion  unless  we  usually  correct. 
6 .5   Criteria  for  Improvement 

In  the  numerical  study  we  used  two  principal  criteria  for 
improvement.   Both  of  these  compared  in  some  way  the  magnitude  of 
the  errors  in  prediction  with  adjustment  with  the  errors  generated 
without  the  adjustment.   In  the  first  criterion,  the  error  in  the 
predicted  values(s)  was  compared  with  the  rms  value  of  the  last 
ten  corrections  (without  adjustment).   This  figure  was  calculated 
as  the  predictions  were  under  way.   For  the  second  criterion,  a 
long  sequence  of  adjustments  were  calculated.   Then  the  rms  values 
of  the  errors  in  prediction  after  adjustment  was  compared  with  the 
rms  value  of  the  error  in  prediction  without  adjustment: 

~  1 2   i   i  2 

Z  Z    -  X     /  E  Z 

1  n    n  '    '  n  ' 

the  range  of  summation  of  n  is  over  each  term  for  which  a  correc- 
tion to  z  was  made.   These  measures  are  easily  made  in  the  machine, 
and  they  define  a  simple  norm  for  the  error  relation. 
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It  is  not  clear  these  norms  are  good  ones  from  a  meteorological 
point  of  view.  Very    likely  the  ultimate  test  will  rest  on  maps  and 
criteria  for  judging  them. 
6.6   Real  Coefficients 

Most  of  the  coefficients  are  complex.   The  one  associated  with 
the  modes  that  are  independent  of  longitude  (meri di anal  )  are  real 
and  hence  one  dimensional.   The  routines  developed  then  involve 
scalar  equations,  rather  than  two-dimensional  vectors.   The 
corresponding  spherical  harmonics  are  the  Legendre  polynomials, 
the  simplest  of  the  Legendre  functions. 

7.    RESULTS 

The  procedure  for  analyzing  the  time  series  was  checked  on 

several  sets  of  made-up  data,  and  on  two  coefficients  a«  and  a. 

1       2 
(of  P-,  and  P2),  using  a  difference  equation  of  second  order,  p  =  2. 

For  data  made  up  of  solutions  to  a  difference  equation,  with 
random  noise  added,  the  solution  could  be  recovered  if  the  noise 
was  not  too  large,  but  the  routine  did  not  perform  yery   well  when 
the  energy  in  the  noise  (rms  value)  was  larger  than  in  the  solution. 

For  meteorological  data,  on  a  string  of  60  pairs  of  correc- 
tions, the  predictions  x,,  to  x,r  led  to  a  definite  improvement  for 
one  coefficient.   The  best  value  for  the  weighting  w  of  the  new 
values  seemed  rather  small:   the  values  .025,  .05  and  .1  gave 
similar  results  and  were  better  than  larger  values.   The  numbers 
below  were  determined  as  follows.   We  considered  the  coefficient 
of  P  .   A  set  of  10  data  points  were  used  at  the  start  to  get 
coefficients  for  the  best  difference  equation  and  the  initial 
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values.   Five  values  were  predicted,  corresponding  to  the  12-hour 

correction  to  be  made  in  12  hours,  24  hours,  ...,  60  hours.   The 

error  in  predicting  these  was  then  calculated  and  normalized  by 

dividing  by  the  rms  of  the  10  original  data.   Then  the  time  index 

was  advanced  by  one  and  the  operation  repeated. 

The  numbers  in  row  one  are  the  rms  values  of  a   ,  -  a 

n  +  1     n ,n  +  l  ' 

normalized  by  the  rms  values  of  zn    Q,  z„  Q,  ...,  z  .   The  numbers 

n  ~  y        n  ~  o  n 

in    row   m    denote    the    rms    value    of   a    .      -    a    .      ,       .    ,    normalized    by 

n+m    n+m- I , n+m  ' 

the  rms  value  of  z   „,  ...,  z  .   Each  is  then  a  correction  to  be 
applied  after  integrating  one  time  step.   Each  entry  comes  from 
the  sum  of  46  terms  . 


No.  of 
Time  Steps 

1 
2 
3 
4 
5 


Wei  ght  w   .025     .05 


.6239 
.6370 
.6670 
.6573 
.651  3 


6257 
6401 
6751 
6698 
6636 


.1 

61  31 
6312 
6834 
6960 
6880 


.2 

.6404 
.6629 
.7599 
.8314 
.8496 


.4 

.81  38 

.8640 

1  .3095 

1  .5523 

1  .9109 


There  are  several  interesting  points  about  the  data.   First, 
the  results  from  the  small  values  of  w,  from  .025  to  .1  are  all 
comparable.   This  indicates  a  stable  trend:   when  w  =  .025  the 
first  of  the  10  data  points  had  a  weighting  almost  .8  of  the  last; 
they  are  weighted  nearly  equally.   For  w  =  .10  the  relative 
weighting  is  about  .39.   A  surprising  feature  is  that  all  five  of 
the  predictions  are  so  similar.   This  is  further  indication  of  a 
stable  trend,  a  large  "random"  element,  and  a  satisfactory  value 
of  p,  and  suggests  longer  predictions  are  feasible. 


19 


The  corrections  to  the  fourth  coefficient  were  not  predicted 
successfully;  the  error  in  the  predicted  values  was  consistently 
a  little  large  than  the  corrections.   However  various  features  of 
the  solutions  are  quite  similar  to  the  other  solutions;  the 
smaller  values  of  w  give  better  predictions  and  all  five  predicted 
values  are  similar  in  size,  with  later  predictions  often  anomalously 
having  smaller  error  than  the  first. 
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SYMBOLS:  DEFINITIONS,  AND  PLACES  WHERE  INTRODUCED 


Section   2 


tn'  tN 


k  ,n 


an(=ak,n> 


n  ,n 


Zn~An~An-l  ,n 


time  N  is  usually  associated  with  the  last 
or  next- to -last  observation. 

the  matrix,  or  vector,  of  coefficients  for 

the  analyzed  field  at  time  t  . 
j  n 

the  k'th  element  of  A  ,  k=l  ,  ...,  K.   Most 
of  the  a.    are  complex  numbers  and  are 
treated  like  two-dimensional  vectors. 

the  number  of  spherical  harmonics  used  to 
represent  a  function. 

a  typical  element  of  A  ,  written  without 
index  k  to  save  writing. 

the  value  predicted  for  A    at  time  t  . 
r  n-|  n 

This  represents  the  discrepancy  between  the 

analyzed  field  at  t   and  the  value  predicted 

J  n  v 

for  it  at  time  t   •,.   Also  called  a  correction 

n-1 


Section  3 


R 
w 
w 


a  typical  element  of  Z  . 

the  component  of  z   that  is  assumed  to  be 

Y  n 

random . 

the  component  of  z   that  is  assumed  to  satisfy 

v  n  ' 

a  difference  equation  (see  below). 

two  by  two  matrices  in  the  difference  equation 

x   =C,x   ,+•••+  C  x   „  . 
n     1  n-1        p  n- p 

weighted    sum    of    residual     errors. 

weighting    of    new    terms. 

the    weighting    of    term    associated   with    t,    at 

time    N>^n;W      decreases    with    N    for    a    fixed 

r      n  (^       \N-n 

val  ue    of    n:  w      =   w  ( 1  -w  J 
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Section    3    (continued) 
x 


n 


a  best  approximation  for  x  ,  made  at  some 
definite  time  t...   (It  might  also  have  been 
designated  by  xk  N  n>) 

the  vector,  or  matrix,  of  x  's. 


Section  4 


A  ,  B 
n    n 


matrices  defined  in  Eq.  (4.2),  used  in 

f  i  ndi  ng    x    ' s . 
a      n 


Dn    =    (An,Bn)         (see    Eqs  .    (4.3),    (4.4)) 


Xij 


the    i'th    component    of   x.,    i,j=l,2. 


'     1  ?        9?        1  1         1  ? 


Section    5 


N,N+m 
N,N+m 


z;  =  A     -  A 
n  n 

m 


an    optimal    predictor    for   AM^    ,    made    at    tM 

N  +  m  N 

na te   of  A.. 

obtaining   AN>N+m 


a    temporary    estimate   of  A..    N+_»    used    in 


n-m  ,n 


typical     elements    of    Z 


a    general    symbol     for    a    quadratic    residual 
to    be    minimized. 
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